KeyVec: Key-semantics Preserving Document Representations

Authors

  • Bin Bi
  • Hao Ma
Abstract

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms take as input for a number of NLP tasks. We propose a neural network model, KEYVEC, which learns document representations with the goal of preserving key semantics of the input text. It enables the learned low-dimensional vectors to retain the topics and important information from the documents that will flow to downstream tasks. Our empirical evaluations show the superior quality of KEYVEC representations in two different document understanding tasks.

Similar resources

Category Theory as a Foundation for Document Processing

Documents, particularly electronic documents that are created, disseminated, and used with computers, have several representations. Users may wish to work with such electronic documents in any of a document's representations, and this can make it difficult to maintain consistency between the different representations of a document. Category theory provides insight into this problem. We begin by d...

Integrating Structure and Meaning: A New Method for Encoding Structure for Text Classification

Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or ‘concepts’. Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduce...

Relational Semantics for Flow Graph Representations as Basis for Transformational Design of Digital Systems

Transformational design is a promising design methodology which combines correctness by construction and interactive design. In this design methodology the design steps are behaviour preserving transformations of one design representation into another. The representations used in transformational design need to have formal semantical models in order to prove the correctness, the behaviour prese...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Meaningfulness of Religious Language in the Light of Conceptual Metaphorical Use of Image Schema: A Cognitive Semantic Approach

According to modern religious studies, religions are rooted in certain metaphorical representations, so they are metaphorical in nature. This article aims to show, first, how conceptual metaphors employ image schemas to make our language meaningful, and then to assert that image-schematic structure of religious expressions, by which religious metaphors conceptualize abstract meanings, is the ba...

Journal:
  • CoRR

Volume abs/1709.09749

Publication date: 2017